Sears Roebuck Kit Home Report

M. Fawcett December 23, 2021

Between 1908 and 1942, the Sears Roebuck company sold houses in the form of build-it-yourself kits. The kits were huge. They were prepared at factories located mostly in Illinois and shipped by railroad freight car to customers all over the country. Each kit contained all the materials needed to build a house except the foundation. Customers lugged their 25 tons of numbered precut lumber, shingles, wall board, flooring and so on to a building site and got to work, following the instructions in the 75-page construction guide.

This unlikely business concept was successful and resulted in sales of between 70,000 and 100,000 kits. Plus, as a mail-order marketer of all manner of merchandise, Sears sold tools used to build the houses, and then the appliances, furniture and fixtures that filled them when they were completed.

Their success was cut short when the Great Depression and World War II took away most of the demand for new housing. After World War II a new form of housing, tract housing, took over and the kit home business faded into oblivion. The Sears company itself has lately been fading into oblivion as well. Having declared bankruptcy in 2018, Sears now barely exists outside of legal proceedings and a handful of remaining stores as its assets are slowly liquidated.

With no official list of where kit homes were built, a few fascinated enthusiasts hunt them down and share their findings through social media and websites.

This report discusses where kit homes are located and offers clues as to where they are likely to be found. It examines factors such as street name, distance of the building site from railroads, economic conditions and the population characteristics of areas.

My original goal was to create a computer program that could analyze a picture of a house and tell you if it was a Sears kit home, and moreover, its model name. This turns out to be a very hard problem due to the large number of models (around 370) that were produced over the years. Another goal was to have a computer program "crawl" through Google Street View images of houses and identify ones that had a high likelihood of being Sears kit homes. This turns out to be economically infeasible because Google charges 7/10ths of a cent each time my program uses Street View to capture an image. Scanning 10,000 images of houses cost \$70.00.

Scanning 1 million images would have cost \$7,000.00.
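That arithmetic, as a tiny sketch (the per-image rate is the one quoted above; Google's actual pricing may change):

```python
# Street View Static API cost at $0.007 (7/10ths of a cent) per image.
# The rate here is the one quoted above, not an authoritative price list.
def street_view_cost(num_images, rate_per_image=0.007):
    return round(num_images * rate_per_image, 2)

print(street_view_cost(10_000))     # → 70.0
print(street_view_cost(1_000_000))  # → 7000.0
```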

The analysis code was written in the Python language. Another software package called QGIS was used to prepare some of the data displayed in the maps. A list (provided by Lara) of around 13,000 confirmed and not-yet-confirmed kit home locations forms the basis of this analysis.

In [1]:
# Load Python modules needed for the analysis
import pandas as pd   # for dataframe manipulation
import numpy as np   # for numerical analysis
import matplotlib.pyplot as plt   # for generating plots and graphs
from matplotlib.pyplot import figure  # for modifying the appearance of plots & graphs
import requests   # to make HTTP requests to the US Census geocoder and census web API
import io   # for working with I/O streams; allows converting geocoder responses to dataframes
import csv   # for reading/writing CSV and text files
import pickle as pk   # to store and retrieve dataframes on disk
import os   # to list the contents of disk drive folders
import folium   # the mapping package
from folium import plugins  # to allow cluster markers on maps
In [2]:
# Settings to improve the display of tabular results
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

Part 1 - Visualize Kit Home Locations on a Map

This part of the analysis is an interactive map of the US that shows the location of each kit home identified by hunters through October 23, 2021.

In [3]:
# Read the Excel file of kit home locations into a Pandas dataframe.
address_df = pd.read_excel(r"Sears Roebuck Houses.xlsx", sheet_name = "Locations")

# Add a row number to each address.  A unique number for each row will be needed by the 
# US Census Bureau geocoder
address_df.insert(loc=0, column='row_num', value=np.arange(len(address_df)) + 2) 
# The +2 accounts for the header row and row "0" so that the row_num value
# aligns with the row number in the original Excel file.

# Remove the "?" from the Auth? column name.
address_df.rename(columns={"Auth?": "Auth"}, inplace = True)

# Examine some of the data
address_df.head()
Out[3]:
row_num Model Address City State Year Auth Added Notes Link #1 Link #2 Twp/Borough/Neighborhood County and State
0 2 Windsor 105 Meadow Brook Dr Clarks Summit PA NaN No AIM NaN NaN NaN NaN NaN
1 3 Columbine 11146 66 St NW Edmonton AB 1928.0 Yes LS Building permit. Canada NaN NaN NaN NaN
2 4 Belmont (old one) 541 Pine St Ketchikan AK NaN No LS NaN NaN NaN NaN NaN
3 5 Americus 303 E Samford Ave Auburn AL NaN No NaN NaN http://photos.al.com/alphotos/2014/09/alabamas... NaN NaN NaN
4 6 Elsmore 608 Brummel Ave Bridgeport AL NaN No NaN NaN NaN NaN NaN NaN
In [4]:
# Total number of locations
n = len(pd.unique(address_df["Address"]))
print ("Number of locations:", f'{n:,}')
Number of locations: 13,832
In [5]:
# Tidy up the values in the "Auth" column
# Change "nan" to "N/A".
address_df["Auth"] = address_df["Auth"].replace(np.nan, 'N/A', regex=True)
# Make all the values in the Auth column uppercase
address_df["Auth"] = address_df["Auth"].apply(lambda x: x.upper())
# Count the number of responses for each Auth value
address_df.groupby("Auth").size()
Out[5]:
Auth
AIM       1
N/A       2
NO     8162
YES    5688
dtype: int64

Count the number of locations in each state

In [6]:
state_count = address_df['State'].value_counts() 

# Plot a bar chart
figure(figsize=(16, 6))
state_count.plot.bar()
plt.title("Number of Locations by State")
plt.show()

Ohio has the most kit home locations followed by Illinois, Pennsylvania and New York. Every state has at least one possible kit home location.

The next step (computer code omitted) is to "geocode" each of the addresses. This involves submitting an address to the US Census Bureau's geocoding service and receiving the address's longitude and latitude, census tract number and Zip Code. This takes around 20 minutes for the entire list of 13,000 addresses. The geocoding results get stored in a computer file so the process doesn't need to run again.
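A sketch of what that omitted step can look like, using the Census Bureau's batch geocoding endpoint. The URL and the benchmark/vintage parameters follow the Bureau's published API; the response column names are chosen to match the ones used later in this notebook and are otherwise assumptions.

```python
import io
import requests
import pandas as pd

# Census Bureau batch geocoder endpoint (returns geographies, i.e. FIPS codes).
GEOCODER_URL = "https://geocoding.geo.census.gov/geocoder/geographies/addressbatch"

# Columns in the CSV the geocoder returns (names mirror this notebook's dataframe).
RESPONSE_COLUMNS = ["ID", "ADDRESS_IN", "MATCH_INDICATOR", "MATCH_TYPE",
                    "ADDRESS_OUT", "LONG_LAT", "TIGER_EDGE", "STREET_SIDE",
                    "FIPS_STATE", "FIPS_COUNTY", "CENSUS_TRACT", "CENSUS_BLOCK"]

def parse_geocoder_response(text):
    """Turn the geocoder's CSV response text into a dataframe."""
    return pd.read_csv(io.StringIO(text), names=RESPONSE_COLUMNS, header=None)

def geocode_batch(address_csv_text):
    """POST one batch of addresses (one 'id,street,city,state,zip' per line)."""
    resp = requests.post(
        GEOCODER_URL,
        files={"addressFile": ("addresses.csv", address_csv_text)},
        data={"benchmark": "Public_AR_Current", "vintage": "Current_Current"},
    )
    return parse_geocoder_response(resp.text)
```

The 13,000 addresses would be sent in batches (the geocoder caps each batch at 10,000 rows), with the concatenated results pickled to disk as described above.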

Once the coordinates of the addresses are known, the mapping process can begin.

In [7]:
# Retrieve the coordinates and other results of the geocoding that were previously stored in a computer file.
geocoded_results_df = pd.read_pickle('geocoded_results.pkl')

# Only keep rows that were successfully geocoded
geocoded_results_df = geocoded_results_df[geocoded_results_df["MATCH_INDICATOR"] == "Match"]

# Convert geography code values from numeric to string
geocoded_results_df['FIPS_STATE'] = geocoded_results_df['FIPS_STATE'].astype(int).astype(str)
geocoded_results_df['FIPS_COUNTY'] = geocoded_results_df['FIPS_COUNTY'].astype(int).astype(str)
geocoded_results_df['CENSUS_TRACT'] = geocoded_results_df['CENSUS_TRACT'].astype(int).astype(str)

# Left-pad geography values with zeros
geocoded_results_df['FIPS_STATE'] = geocoded_results_df['FIPS_STATE'].apply('{:0>2}'.format)
geocoded_results_df['FIPS_COUNTY'] = geocoded_results_df['FIPS_COUNTY'].apply('{:0>3}'.format)
geocoded_results_df['CENSUS_TRACT'] = geocoded_results_df['CENSUS_TRACT'].apply('{:0>6}'.format)


# Create a unique geographic identifier by combining state, county and census tract code for each row.
geocoded_results_df["GeoID"] = geocoded_results_df["FIPS_STATE"] \
                                + geocoded_results_df["FIPS_COUNTY"] \
                                + geocoded_results_df["CENSUS_TRACT"]

# Split the LONG_LAT column into separate Longitude and Latitude columns
geocoded_results_df[['Longitude', 'Latitude']] = geocoded_results_df['LONG_LAT'].str.rsplit(',', n=1, expand=True)

# Merge the Sears Kit Home style for each location from the original address list with the geocoded results.
mapping_data_df = pd.merge(left = address_df[['row_num','Model','Address','City','State','Auth']], 
                           right = geocoded_results_df, 
                           how = 'right', 
                           left_on = 'row_num',
                           right_on = 'ID')

# Examine some of the results
mapping_data_df.head()
Out[7]:
row_num Model Address City State Auth ID ADDRESS_IN MATCH_INDICATOR MATCH_TYPE ADDRESS_OUT LONG_LAT TIGER_EDGE STREET_SIDE FIPS_STATE FIPS_COUNTY CENSUS_TRACT CENSUS_BLOCK Zipcode GeoID Longitude Latitude
0 2 Windsor 105 Meadow Brook Dr Clarks Summit PA NO 2 105 Meadow Brook Dr, Clarks Summit, PA, Match Exact 105 MEADOW BROOK DR, CLARKS SUMMIT, PA, 18411 -75.71287,41.500095 139319156.0 R 42 069 110402 2006.0 18411 42069110402 -75.71287 41.500095
1 4 Belmont (old one) 541 Pine St Ketchikan AK NO 4 541 Pine St, Ketchikan, AK, Match Exact 541 PINE ST, KETCHIKAN, AK, 99901 -131.64699,55.344284 207096132.0 L 02 130 000300 2000.0 99901 02130000300 -131.64699 55.344284
2 5 Americus 303 E Samford Ave Auburn AL NO 5 303 E Samford Ave, Auburn, AL, Match Exact 303 E SAMFORD AVE, AUBURN, AL, 36830 -85.47823,32.59884 1569988.0 L 01 081 040300 2008.0 36830 01081040300 -85.47823 32.59884
3 6 Elsmore 608 Brummel Ave Bridgeport AL NO 6 608 Brummel Ave, Bridgeport, AL, Match Exact 608 BRUMMEL AVE, BRIDGEPORT, AL, 35740 -85.7156,34.947346 58044064.0 R 01 071 950200 1062.0 35740 01071950200 -85.7156 34.947346
4 7 Osborn 708 2nd St SE Cullman AL NO 7 708 2nd St SE, Cullman, AL, Match Exact 708 2ND ST SE, CULLMAN, AL, 35055 -86.83624,34.17931 130220321.0 R 01 043 964901 4032.0 35055 01043964901 -86.83624 34.17931

In Census Bureau speak, a GEOID is a string of digits that uniquely identifies the state, county and census tract of a location. It is used to look up all kinds of information about an area, such as population, income, jobs, ages and much more.
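For instance, the first row of the table above (state FIPS 42, county 069, tract 110402) combines into its 11-digit GEOID like this, a minimal illustration of the zero-padded concatenation performed in the previous cell:

```python
# Zero-pad the FIPS state (2 digits), county (3 digits) and tract (6 digits)
# codes, then concatenate them into an 11-digit GEOID.
state_fips, county_fips, tract = 42, 69, 110402
geoid = f"{state_fips:02d}{county_fips:03d}{tract:06d}"
print(geoid)  # → 42069110402
```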

In [8]:
# Build a list containing all the coordinates so they can be plotted on the map
locations = mapping_data_df[['Latitude', 'Longitude']]
locationlist = locations.values.tolist()
len(locationlist)

# An example of one point in the kit home location list
locationlist[7]
Out[8]:
['34.765038', '-87.70327']

Display the interactive map

The next computer code generates a USA map that shows the CONFIRMED (authenticated) kit home locations with DARK BLUE markers and the UNCONFIRMED (not authenticated) kit home locations with LIGHT BLUE markers.

Using the Map...

The thing that looks like a stack of square pancakes in the upper right corner of the map is a "layer control". You can use it to hide or show the confirmed and unconfirmed locations.

The + and - in the upper left of the map lets you zoom in and out.

Clicking on a numbered marker zooms in and separates a bigger group into smaller groups.

Once you zoom in far enough you will see individual markers that tag a single location. These are the markers with a little "i" in the center. If you click on one of these you will see the address. Something that is kind of fun to do is to highlight the address (just the address), right-click the highlight, and select "Search with Google". In most cases it will bring up a Street View page for the house. There is no charge for this sort of use of Street View.

In [18]:
# Create a map using the Map() function and the coordinates of the locations of all the homes.

# Map starts out centered on Ohio.
mp = folium.Map(location=[40.367474, -82.996216], zoom_start=7, width=900, height=550, control_scale=True)
# Ohio_map

### Define functions to set the color of cluster markers. Confirmed and unconfirmed locations have 
### different colors.
# This sets the color for CONFIRMED locations clusters.
icon_create_function_confirmed = """
    function(cluster) {
    var childCount = cluster.getChildCount(); 
    /* 
    // comment: can have something like the following to modify the different cluster sizes....
    var c = ' marker-cluster-';

    if (childCount < 50) {
        c += 'large';
    } else if (childCount < 300) {
        c += 'medium';
    } else {
        c += 'small';
    }    
    
    // The marker-cluster-<'size'> gets passed in the "return new L.DivIcon()" function below.
    */
    
    return new L.DivIcon({ html: '<div><span style="background-color:darkblue;color:white;font-size: 20px;">' + childCount + '</span></div>', className: 'marker-cluster', iconSize: new L.Point(40, 30) });    
    }
    """

# This sets the color of UNCONFIRMED location clusters.
icon_create_function_unconfirmed = """
    function(cluster) {
    var childCount = cluster.getChildCount(); 

    return new L.DivIcon({ html: '<div><span style="background-color:lightblue;color:black;font-size: 20px;">' + childCount + '</span></div>', className: 'marker-cluster', iconSize: new L.Point(40, 30) });    
    }
    """

# Feature groups allow customization of the layer control labels (otherwise they show an auto-generated "macro..." name).
fg_confirmed = folium.FeatureGroup(name = 'Confirmed Locations', show = True)
mp.add_child(fg_confirmed)
fg_unconfirmed = folium.FeatureGroup(name = 'Unconfirmed Locations', show = True)
mp.add_child(fg_unconfirmed)

# Add the Marker clusters for confirmed and unconfirmed locations to feature group
marker_cluster_confirmed = plugins.MarkerCluster(icon_create_function = icon_create_function_confirmed).add_to(fg_confirmed)  
marker_cluster_unconfirmed = plugins.MarkerCluster(icon_create_function=icon_create_function_unconfirmed).add_to(fg_unconfirmed) 

# A function to choose a marker color depending on if the house is a confirmed kit house or not.
# The individual location markers use the same color as their cluster markers.
def getcolor(auth_val):
    if auth_val == 'YES':
        return ("darkblue", "Confirmed")
    return ("lightblue","Unconfirmed")

### Add a layer to the map showing Confirmed kit homes
# Loop through all the location pairs.
for point in range(0, len(locationlist)):
    try:
        clr, status = getcolor(mapping_data_df["Auth"][point])
        if status == "Confirmed":
            folium.Marker(
                location = locationlist[point], 
                popup = status + " " + mapping_data_df['Model'][point] + ": " + mapping_data_df['ADDRESS_OUT'][point],            
                icon = folium.Icon(color = clr)
            ).add_to(marker_cluster_confirmed)

    except Exception:  # not all addresses could be geocoded so skip them if coordinates are missing
        pass
        
### Add a layer to the map showing Unconfirmed kit homes
for point in range(0, len(locationlist)):
    try:
        clr, status = getcolor(mapping_data_df["Auth"][point])
        if status == "Unconfirmed":
            folium.Marker(
                location = locationlist[point], 
                popup = status + " " + mapping_data_df['Model'][point] + ": " + mapping_data_df['ADDRESS_OUT'][point],            
                icon = folium.Icon(color = clr)
            ).add_to(marker_cluster_unconfirmed)

    except Exception:  # not all addresses could be geocoded so skip them if coordinates are missing
        pass
        
# add layer control to map (allows layer to be turned on or off)
folium.LayerControl().add_to(mp)

# Display the map
mp
Out[18]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Street Names More Likely to Have Kit Homes

The following analysis was done using just the locations within Ohio. This was done to speed up processing times and to make the results a little more manageable (and understandable).

The analysis uses a national database of addresses that is being developed by the US Department of Transportation for emergency notifications and disaster recovery. See https://www.transportation.gov/gis/national-address-database/national-address-database-0. Not all states have completed their submissions, but Ohio is one that has.

In [10]:
# Build an Ohio list of kit house locations
kitpoints_df = mapping_data_df.loc[(mapping_data_df['MATCH_INDICATOR'] != 'No_Match') & (mapping_data_df['State'] == 'OH'), ['Model', 'Latitude', 'Longitude', 'FIPS_STATE', 'FIPS_COUNTY', 'CENSUS_TRACT', 'GeoID']]

# Save the Ohio points to a csv file so the data can be used by QGIS software later.
kitpoints_df.to_csv("OhioKitPoints.csv", sep = ",", header = True, index = False)

The next step assumes that I previously downloaded the street addresses for Ohio from the National Address Database and stored them in a CSV file.

In [11]:
# Extract the street name and address type of each address in the state.
column_lst = ["StreetName", "Addr_Type"]
nad_addressTypes_df = pd.read_csv("NAD_r7_Ohio.csv", usecols = column_lst)
# Count the occurrences of each address type
pd.DataFrame(nad_addressTypes_df.groupby(['Addr_Type']).size())
Out[11]:
0
Addr_Type
Commercial 286354
Educational 1898
Government 1413
Industrial 735
Other 22863
Residential 3501707
Unknown 854263

There are 3,501,707 addresses that are specifically identified as Residential in Ohio according to the NAD. I will use 3,501,707 as the denominator when calculating the proportion of residential addresses on each street name.

In [12]:
state_address_count = 3501707
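As a quick worked example of the expected-count arithmetic applied in the cells below: a street name carrying 9,700 of Ohio's residential addresses, in a state with a hypothetical 2,744 kit homes, would be expected to hold about 7.6 kit homes. Both the street count and the statewide kit-home total here are illustrative, not the real figures.

```python
# Expected kit homes on a street name = (street's share of all residential
# addresses) x (total kit homes in the state).  Numbers are illustrative.
state_address_count = 3_501_707      # residential addresses in Ohio (NAD)
street_address_count = 9_700         # hypothetical street name's address count
num_kithomes_in_state = 2_744        # hypothetical statewide kit home total

proportion = street_address_count / state_address_count
expected_kit_homes = round(proportion * num_kithomes_in_state, 2)
print(expected_kit_homes)  # → 7.6
```

A street whose actual count of kit homes greatly exceeds this expectation is disproportionately likely to hold them, which is what the comparison table below surfaces.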
In [13]:
# The NAD "StreetName" column provides a basic (without Drive, Street, Avenue, etc) street name for 
# each address, which is what I'll need in my street address frequency analysis.
nad_street_address_count_df = pd.DataFrame(nad_addressTypes_df.loc[nad_addressTypes_df['Addr_Type'] == 'Residential'].groupby(['StreetName']).size().sort_values(ascending=False))
In [14]:
# Getting a similar basic street name for each address in the kit home list is a little more complicated.

# Start with the full addresses
kithome_address_df = mapping_data_df.loc[mapping_data_df['State'] == 'OH', ("State", 'Address')]

# Remove anything that is not a word character or a space
myregex0 = r"[^\w\s]"
kithome_streetnames_ser0 = kithome_address_df["Address"].str.replace(myregex0, '', regex = True, case = False)

# Remove the leading house number
myregex1 = r"(^\d+)"
kit_home_streetnames_ser1 = kithome_streetnames_ser0.str.replace(myregex1, '', regex = True, case = False)

# Remove N, E, S, W, St, Ave, Rd, Dr
myregex2 = r"( [nesw]\b)|( dr\b)|( st\b)|( rd\b)|( ave\b)|( pl\b)|( blvd\b)|( se\b)|( ne\b)"
kithome_streetnames_ser2 = kit_home_streetnames_ser1.str.replace(myregex2, '', regex = True, case = False)

num_kithomes_in_state = len(kithome_streetnames_ser2)
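As a stand-alone check, the same three regexes can be applied to a single address with Python's `re` module (the sample inputs below are taken from the address table shown earlier; this mirrors, but is not identical to, the pandas pipeline above):

```python
import re

def basic_street_name(address):
    """Reduce a full street address to a basic street name."""
    s = re.sub(r"[^\w\s]", "", address)        # drop anything not a word char or space
    s = re.sub(r"^\d+", "", s)                 # drop the leading house number
    s = re.sub(r"( [nesw]\b)|( dr\b)|( st\b)|( rd\b)|( ave\b)|( pl\b)|( blvd\b)|( se\b)|( ne\b)",
               "", s, flags=re.IGNORECASE)     # drop directions and street-type suffixes
    return s.strip().upper()

print(basic_street_name("105 Meadow Brook Dr"))  # → MEADOW BROOK
print(basic_street_name("303 E Samford Ave"))    # → SAMFORD
```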

To keep things more manageable, focus on the 30 most frequently mentioned streets in the Ohio kit home list.

In [15]:
# Count the kit homes per street and take the top 30 streets.
kithome_top_30_streets_df = pd.DataFrame(kithome_streetnames_ser2.value_counts()).head(30)

# Give the counts column a name
kithome_top_30_streets_df.columns = ["Actual Kit Homes"]

# Give the index a name
kithome_top_30_streets_df.index.name = "StreetName"

# Make index upper case
kithome_top_30_streets_df.index = kithome_top_30_streets_df.index.str.upper()

# Strip whitespace off index values (the street names)
kithome_top_30_streets_df.index = kithome_top_30_streets_df.index.str.strip()
In [16]:
# Get the address count proportions for street names in state overall
ohio_address_count_by_street_df = nad_street_address_count_df

# Add a column heading
ohio_address_count_by_street_df.columns = ["AddressCount"]

# Add a column that shows the proportion of addresses for each street in the state 
ohio_address_count_by_street_df['Proportion'] = ohio_address_count_by_street_df['AddressCount'] / state_address_count
In [17]:
# Calculate the number of kit homes expected for each Ohio street name
ohio_address_count_by_street_df["Expected Kit Homes"] = round(ohio_address_count_by_street_df["Proportion"] * num_kithomes_in_state, 2)

# Merge the actual counts of kit homes by street name and compare to the expected number.
comparison3_df = ohio_address_count_by_street_df.join(kithome_top_30_streets_df, lsuffix='_State', rsuffix='_Kits')


comparison3_df[comparison3_df["Actual Kit Homes"] - comparison3_df["Expected Kit Homes"] > 0]
Out[17]:
AddressCount Proportion Expected Kit Homes Actual Kit Homes
StreetName
PARK 9700 0.002770 7.60 9.0
MAPLE 9136 0.002609 7.16 15.0
WASHINGTON 9030 0.002579 7.08 9.0
LINCOLN 8373 0.002391 6.56 9.0
CLEVELAND 6511 0.001859 5.10 16.0
BROADWAY 6103 0.001743 4.78 9.0
OAK 5713 0.001631 4.48 10.0
CHURCH 5176 0.001478 4.06 9.0
RIVER 4544 0.001298 3.56 13.0
FOREST 2986 0.000853 2.34 9.0
WARREN 2182 0.000623 1.71 12.0
CAMBRIDGE 1838 0.000525 1.44 17.0
WOOSTER 1721 0.000491 1.35 11.0
HUNTER 1103 0.000315 0.86 10.0
HILLSIDE 885 0.000253 0.69 12.0
WYOMING 855 0.000244 0.67 10.0
SEYMOUR 539 0.000154 0.42 8.0
SUTTON 472 0.000135 0.37 11.0
BEACON 415 0.000119 0.33 13.0
JOSEPH 391 0.000112 0.31 9.0
PONDVIEW 181 0.000052 0.14 9.0
CARTHAGE 145 0.000041 0.11 10.0
FOURTEENTH 137 0.000039 0.11 14.0
ELBERON 86 0.000025 0.07 9.0
KRYDER 49 0.000014 0.04 9.0
HOLLIBAUGH 48 0.000014 0.04 9.0